2025-12-02
Can all be done with Gaussian distributions!
If we have a vector of random variables \(\mathbf{x}\) and
\[ \mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \mathbf{\Sigma}) \]
then the joint probability density of \(\mathbf{x}\) is given by the multivariate normal:
\[ p\left( \mathbf{x} \,|\, \boldsymbol{\mu}, \mathbf{\Sigma} \right) \propto \exp \left\{ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^\top \mathbf{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\} \]
A two-dimensional MVN with \(\boldsymbol{\mu}=[0,0]\) and \(\Sigma=\begin{bmatrix} 1 & 0.7\\ 0.7 & 1 \end{bmatrix}\):
# dmvnorm() comes from the mvtnorm package
library(mvtnorm)

# mean vector
mu <- c(0, 0)
# covariance matrix
Sigma <- matrix(c(1,   0.7,
                  0.7, 1), nrow = 2)
# grid of x1 and x2 values
x1x2_grid <- expand.grid(x1 = seq(-3, 3, length.out = 100),
                         x2 = seq(-3, 3, length.out = 100))
# density evaluated at each grid point
probabilities <- dmvnorm(x1x2_grid, mean = mu, sigma = Sigma)
Probability contour plot:
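One way such a contour plot might be produced — a sketch using ggplot2, which the original code does not show; the grid and density computation are repeated from above so the snippet stands alone:

```r
library(mvtnorm)
library(ggplot2)

# same setup as above
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)
x1x2_grid <- expand.grid(x1 = seq(-3, 3, length.out = 100),
                         x2 = seq(-3, 3, length.out = 100))
x1x2_grid$density <- dmvnorm(as.matrix(x1x2_grid), mean = mu, sigma = Sigma)

# contour lines of the bivariate normal density
ggplot(x1x2_grid, aes(x = x1, y = x2, z = density)) +
  geom_contour() +
  coord_fixed()
```

The positive off-diagonal 0.7 shows up as ellipses tilted along the \(x_1 = x_2\) diagonal.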
We can condition on one of the variables: \(p(x_2 \,|\, x_1,\, \mathbf{\Sigma})\).
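For the bivariate case above (zero means, unit variances, correlation \(0.7\)), the standard Gaussian conditioning identity gives a one-dimensional normal whose mean shifts with the observed \(x_1\) and whose variance shrinks:

\[ x_2 \,|\, x_1 \sim \mathcal{N}\big(0.7\,x_1,\; 1 - 0.7^2\big) \]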
\[ \mathbf{\Sigma} = \begin{bmatrix} 1 & 0.7\\ 0.7 & 1 \end{bmatrix} \]
\[ \mathbf{\Sigma} = \begin{bmatrix} 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 & 0.01 & 0.00 & 0.00 \\ 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 & 0.01 & 0.00 \\ 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 & 0.01 \\ 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 & 0.03 \\ 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 & 0.08 \\ 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 & 0.20 \\ 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 & 0.41 \\ 0.01 & 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 & 0.67 \\ 0.00 & 0.01 & 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 & 0.90 \\ 0.00 & 0.00 & 0.01 & 0.03 & 0.08 & 0.20 & 0.41 & 0.67 & 0.90 & 1.00 \end{bmatrix} \]
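A matrix with this banded structure can be generated by a squared-exponential covariance over the index positions — a sketch; the length-scale value here is an assumption chosen to roughly reproduce the entries shown:

```r
# squared-exponential covariance over index positions 1..10
idx <- 1:10
ell <- 2.2  # length-scale (assumed; tuned so adjacent entries are ~0.9)
Sigma10 <- outer(idx, idx, function(a, b) exp(-(a - b)^2 / (2 * ell^2)))
round(Sigma10, 2)
```

Nearby variables are strongly correlated and the correlation decays smoothly with distance — exactly the structure a GP kernel will impose.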
As a simple example, imagine we are given two data points \(\{(x_1, y_1), (x_2, y_2)\}\) and need to predict the function at new inputs.
A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables.
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
As a distribution over functions, a Gaussian process is completely specified by two functions: a mean function \(m(\mathbf{x})\) and a covariance function \(k(\mathbf{x}, \mathbf{x}^*)\):
\[ f(\mathbf{x}) \sim \mathcal{GP}\big(\, m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}^*) \, \big) \]
Generative model (the noise term is dropped for noise-free observations):
\[ y(\mathbf{x}) = f(\mathbf{x}) \Big[ + \sigma_y \epsilon \Big], \qquad \epsilon \sim \mathcal{N}(0,1) \]
Place GP prior over the nonlinear function (mean function often taken as 0).
\[ p(f(\mathbf{x}) \,|\, \theta) = \mathcal{GP}\big(0,\, k(\mathbf{x}, \mathbf{x}^*)\big), \qquad k(x, x^*) = \sigma^2 \exp \left\{ -\frac{1}{2\ell^2}(x-x^*)^2 \right\} \]
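To build intuition for this prior, we can evaluate the kernel on a grid and draw sample functions from the resulting finite-dimensional MVN — a sketch using `rmvnorm()` from mvtnorm; the \(\sigma^2\) and \(\ell\) values are assumptions:

```r
library(mvtnorm)

# inputs and squared-exponential kernel matrix
x      <- seq(-3, 3, length.out = 50)
sigma2 <- 1  # signal variance (assumed)
ell    <- 1  # length-scale (assumed)
K <- sigma2 * exp(-outer(x, x, "-")^2 / (2 * ell^2))

# three draws from the zero-mean GP prior; small jitter keeps K numerically positive definite
f_draws <- rmvnorm(3, mean = rep(0, length(x)),
                   sigma = K + 1e-8 * diag(length(x)))
matplot(x, t(f_draws), type = "l", lty = 1, ylab = "f(x)")
```

Each row of `f_draws` is one smooth random function; shrinking `ell` makes the draws wigglier.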
\[ p(\mathbf{y}_1, \mathbf{y}_2) = \mathcal{N} \left( \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} \mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12}\\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{bmatrix}\right) \\ p(\mathbf{y}_1 \,|\, \mathbf{y}_2) = \frac{p(\mathbf{y}_1, \mathbf{y}_2)}{p(\mathbf{y}_2)} \]
With some involved algebra, the conditional is again Gaussian:
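The notes stop here; the standard result of that derivation is the Gaussian conditioning identity:

\[ p(\mathbf{y}_1 \,|\, \mathbf{y}_2) = \mathcal{N}\big( \boldsymbol{\mu}_1 + \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}(\mathbf{y}_2 - \boldsymbol{\mu}_2),\; \mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}\mathbf{\Sigma}_{21} \big) \]

The bivariate example earlier is the special case \(\boldsymbol{\mu} = \mathbf{0}\), \(\mathbf{\Sigma}_{11} = \mathbf{\Sigma}_{22} = 1\), \(\mathbf{\Sigma}_{12} = 0.7\).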